Two less addressed issues of deep reinforcement learning are (1) lack of generalization capability to new target goals, and (2) data inefficiency, i.e., the model requires several (and often costly) episodes of trial and error to converge, which makes it impractical to apply to real-world scenarios. In this paper, we address these two issues and apply our model to the task of target-driven visual navigation. To address the first issue, we propose an actor-critic model whose policy is a function of the goal as well as the current state, which allows it to generalize better. To address the second issue, we propose the AI2-THOR framework, which provides an environment with high-quality 3D scenes and a physics engine. Our framework enables agents to take actions and interact with objects. Hence, we can collect a huge number of training samples efficiently. We show that our proposed method (1) converges faster than state-of-the-art deep reinforcement learning methods, (2) generalizes across targets and across scenes, (3) generalizes to a real robot scenario with a small amount of fine-tuning (although the model is trained in simulation), and (4) is end-to-end trainable and does not need feature engineering, feature matching between frames, or 3D reconstruction of the environment. The supplementary video can be accessed at the following link: https://youtu.be/SmBxMDiOrvs.
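The key architectural idea above, a policy that takes the goal as an input alongside the current state, can be illustrated with a minimal sketch. This is not the paper's actual network (the real model uses deep visual features and learned weights); the dimensions, weights, and layer structure here are purely hypothetical, chosen only to show how a single actor-critic network can produce different action distributions for different goals without retraining.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative dimensions, not the paper's actual sizes.
STATE_DIM, GOAL_DIM, HIDDEN, N_ACTIONS = 8, 8, 16, 4

# Hypothetical (random, untrained) weights: a joint state-goal
# embedding feeding a policy head (actor) and a value head (critic).
W_embed = rng.standard_normal((STATE_DIM + GOAL_DIM, HIDDEN)) * 0.1
W_pi = rng.standard_normal((HIDDEN, N_ACTIONS)) * 0.1
W_v = rng.standard_normal((HIDDEN, 1)) * 0.1

def softmax(x):
    z = x - x.max()           # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def actor_critic(state, goal):
    """Compute (action distribution, value) from state AND goal jointly,
    so a new goal changes behavior without training a new policy."""
    h = np.tanh(np.concatenate([state, goal]) @ W_embed)
    return softmax(h @ W_pi), float(h @ W_v)

# Same observation, two different navigation targets:
state = rng.standard_normal(STATE_DIM)
probs_a, value_a = actor_critic(state, rng.standard_normal(GOAL_DIM))
probs_b, value_b = actor_critic(state, rng.standard_normal(GOAL_DIM))
# probs_a and probs_b differ: the goal steers the policy.
```

In contrast, a conventional deep RL policy conditioned on the state alone would need to be retrained (or have its weights swapped) for every new target, which is the generalization gap the goal-conditioned formulation is meant to close.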